{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training a neural network to solve a regression problem \n",
    "\n",
    "Consider an e-commerce company that's trying to solve the problem of demand prediction - they'd like to estimate the number of mobile phones that are likely to be purchased in the upcoming week so that they can plan their inventory accordingly. Our goal is to develop a model that can make such a prediction. Let's assume that the demand for a given week is a function of 3 variables - (a) number of mobile phones sold in the previous week, (b) discounts offered, and (c) number of weeks to the next festival. Let's call these variables $prev\\_week\\_sales$, $discount\\_fraction$ and $weeks\\_to\\_next\\_festival$ respectively. This problem can be modelled as a regression problem wherein we predict the number of mobile phones sold in the upcoming week from an input vector of the form [$prev\\_week\\_sales$, $discount\\_fraction$,  $weeks\\_to\\_next\\_festival$]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<torch._C.Generator at 0x106f73590>"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import torch\n",
    "torch.manual_seed(100)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "class TwoLayeredNN(torch.nn.Module):\n",
    "    \"\"\"\n",
    "    We define a torch module that represents a simple neural network with 2 hidden layers\n",
    "    \"\"\"\n",
    "    def __init__(self, input_size, hidden1_size, hidden2_size, output_size):\n",
    "        \"\"\"\n",
    "        Args\n",
    "            input_size(int): Number of inputs\n",
    "            hidden1_size (int): Number of neurons in the first hidden layer\n",
    "            hidden2_size(int): Number of neurons in the second hidden layer\n",
    "            output_size(int): Number of hidden layer neurons\n",
    "        \"\"\"\n",
    "        super(TwoLayeredNN, self).__init__()\n",
    "        self.model = torch.nn.Sequential(\n",
    "            torch.nn.Linear(input_size, hidden1_size), # (input x hidden1)\n",
    "            torch.nn.Sigmoid(),\n",
    "            torch.nn.Linear(hidden1_size, hidden2_size), # (hidden1 x hidden2)\n",
    "            torch.nn.Sigmoid(),\n",
    "            torch.nn.Linear(hidden2_size, output_size) # (hidden2 x output)\n",
    "        )\n",
    "        self._initialize_weights()\n",
    "            \n",
    "    def forward(self, X):\n",
    "        return self.model(X)\n",
    "    \n",
    "    def _initialize_weights(self):\n",
    "        for m in self.model.modules():\n",
    "            if isinstance(m, torch.nn.Linear):\n",
    "                torch.nn.init.xavier_uniform(m.weight.data)\n",
    "                torch.nn.init.constant_(m.bias.data, 0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the purposes of demonstration, let us construct a toy dataset $X$ having $n$ samples that comprises of the 3 features ($prev\\_week\\_sales$, $discount\\_fraction$ and $weeks\\_to\\_next\\_festival$). We will also generate the ground truth $\\vec{y}$ representing the current week sales as a random non linear function of the input variables. Our task will now be to build a neural network that is able to learn this function from the given dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Let us artificially generate the training data\n",
    "n = 10000\n",
    "\n",
    "# Sales of previous week\n",
    "X_prev_week_sales = torch.randint(low=10000, high=90000, size=(n,)).float().unsqueeze(1)\n",
    "# Discount offered\n",
    "X_discount_fraction = torch.randint(low=0, high=80, size=(n,)).float().unsqueeze(1)\n",
    "# Number of weeks to the next festival\n",
    "X_weeks_to_next_festival = torch.randint(low=0, high=10, size=(n,)).float().unsqueeze(1)\n",
    "\n",
    "# X is a Nx3 matrix represtenting our toy dataset\n",
    "X = torch.cat((X_prev_week_sales, X_discount_fraction, X_weeks_to_next_festival), dim=1)\n",
    "\n",
    "# y is the ground truth vector which we generate as an arbitrary function of the input variables\n",
    "y = torch.ceil(1.2 * X_prev_week_sales + \\\n",
    "    (X_discount_fraction / 100).pow(2) * 5000 + \\\n",
    "    (10 - X_weeks_to_next_festival).pow(3) + \\\n",
    "    100 * torch.randn(X_prev_week_sales.shape)) # We also add in some random Gaussian noise"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[7.6440e+04, 6.3000e+01, 2.0000e+00],\n",
       "        [4.1512e+04, 5.0000e+01, 3.0000e+00],\n",
       "        [7.7395e+04, 7.7000e+01, 9.0000e+00],\n",
       "        ...,\n",
       "        [2.1532e+04, 7.0000e+01, 4.0000e+00],\n",
       "        [3.0035e+04, 5.0000e+00, 2.0000e+00],\n",
       "        [1.8894e+04, 1.9000e+01, 4.0000e+00]])"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([[94182.],\n",
       "        [51531.],\n",
       "        [95938.],\n",
       "        ...,\n",
       "        [28559.],\n",
       "        [36599.],\n",
       "        [23056.]])"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the range of values for each of the features is completely different. $prev\\_week\\_sales$ is in the order of thousands of units sold, $discount\\_fraction$ is in the order of 1 - 100 and $weeks\\_to\\_next\\_festival$ is the in the order of 1 - 10 weeks.  In machine learning, it is generally a good practice to bring all the values to a common scale because it can help improve the speed of training and also reduce the chance of getting stuck at a local minima."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def min_max_norm(X, y):\n",
    "    X, y = X.clone(), y.clone()\n",
    "    X_min, X_max = torch.min(X, dim=0)[0], torch.max(X, dim=0)[0]\n",
    "    X = (X - X_min) / (X_max - X_min)\n",
    "    y_min, y_max = torch.min(y, dim=0)[0], torch.max(y, dim=0)[0]\n",
    "    y = (y - y_min) / (y_max - y_min)\n",
    "    return X, y\n",
    "X_norm, y_norm = min_max_norm(X, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tensor([1., 1., 1.])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_norm.max(dim=0)[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To solve the regression problem, let's first define a 2-layer Neural Network model that can take in $3d$ input vectors of the form [$prev\\_week\\_sales$, $discount\\_fraction$, $is\\_festival\\_ongoing$] and generate output predictions. The simple neural network will contain 2 hidden layers with 10 and 5 neurons respectively."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/var/folders/tx/pqtg9xkd5156pd7hbgdr_4140000gn/T/ipykernel_75984/911663540.py:29: UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.\n",
      "  torch.nn.init.xavier_uniform(m.weight.data)\n"
     ]
    }
   ],
   "source": [
    "# Let us create the neural network to do regression. \n",
    "# input_size = number of input features\n",
    "# hidden1_size = 10\n",
    "# hidden2_size = 5\n",
    "# output_size = 1 because we are regressing to a given value\n",
    "\n",
    "nn = TwoLayeredNN(input_size=X_norm.shape[-1],\n",
    "                  hidden1_size=10,\n",
    "                  hidden2_size=5,\n",
    "                  output_size=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We want a loss function that compares the demand predicted by the neural network model with the actual demand from the ground truth, and returns larger values when the difference is higher and smaller values when the difference is lower. Mean squared error is one such loss that is readily available in PyTorch through the $torch.nn.MSELoss$ class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "loss = torch.nn.MSELoss() # Mean squared error"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "During training, we iteratively run the forward pass, compute the loss, calculate gradients and update the weights. The neural network is initialized with random weights and hence makes arbitrary predictions for the demand in the early iterations of the training loop. This translates to a high initial loss value. However, as the training proceeds, the weights are updated so that the loss value is minimized, and the predicted demand comes closer to the actual ground truth. \n",
    "\n",
    "To update the weights, we use what is known as an optimizer. Here, we use the Stochastic Gradient Descent based optimizer which can be invoked using $torch.optim.SGD$. PyTorch offers various optimizers  which will be discussed in detail in the next chapter.\n",
    "\n",
    "We typically run the training loop until the loss reaches a low enough value that is acceptable. Once the training loop completes, we have a model that can readily take in new data points and generate output predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Step: 0 Loss: 0.08074740320444107\n",
      "Step: 1000 Loss: 2.626368950586766e-05\n",
      "Step: 2000 Loss: 2.1405438019428402e-05\n",
      "Step: 3000 Loss: 1.9865865397150628e-05\n",
      "Step: 4000 Loss: 1.8848664694814943e-05\n",
      "Step: 4999 Loss: 1.8041795556200668e-05\n"
     ]
    }
   ],
   "source": [
    "optimizer = torch.optim.SGD(nn.parameters(), lr=0.3, momentum=0.9)\n",
    "num_iters = 5000\n",
    "for i in range(num_iters):\n",
    "    # Forward pass\n",
    "    y_out = nn(X_norm)\n",
    "    # Compute mean squared loss\n",
    "    mse_loss = loss(y_out, y_norm)\n",
    "    if i % 1000 == 0:\n",
    "        print(f\"Step: {i} Loss: {mse_loss}\")\n",
    "    # Clear gradients\n",
    "    optimizer.zero_grad()  \n",
    "    # Backpropogation\n",
    "    mse_loss.backward()\n",
    "    optimizer.step()\n",
    "print(f\"Step: {i} Loss: {mse_loss}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
